Parallel and Distributed Data Pipelining with KNIME

Authors

  • C. Sieb
  • T. Meinl
  • M. R. Berthold
Abstract

In recent years a new category of data analysis applications has evolved, known as data pipelining tools, which enable even non-experts to perform complex analysis tasks on potentially huge amounts of data. Because the analysis processes and methods involved are complex and compute-intensive, it is often neither sufficient nor possible to rely simply on performance gains in single processors. Parallel and distributed approaches are promising solutions to this problem, since they can accelerate the analysis process. In this paper we discuss the potential for parallelizing and distributing pipelining tools by demonstrating several parallel and distributed implementations in the open-source pipelining platform KNIME. We verify their practical applicability in a number of real-world experiments.
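As a rough illustration of the data-parallel idea the abstract refers to, the sketch below splits a table into row chunks, runs one pipeline step on each chunk concurrently, and concatenates the results in the original order. It is not KNIME's API or the authors' implementation: `split_rows`, `normalize_chunk`, and `run_node_parallel` are hypothetical names chosen for this example, and the per-row normalization step is an assumed stand-in for any node whose rows can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_chunks):
    """Split a list of rows into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def normalize_chunk(chunk):
    """Hypothetical per-row step: scale each row so its values sum to 1."""
    out = []
    for row in chunk:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def run_node_parallel(rows, node, n_workers=4):
    """Apply `node` to each chunk concurrently; keep the original row order."""
    chunks = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(node, chunks)  # map yields results in chunk order
    return [row for chunk in results for row in chunk]


if __name__ == "__main__":
    table = [[1.0, 1.0], [2.0, 6.0], [0.0, 0.0], [3.0, 1.0]]
    print(run_node_parallel(table, normalize_chunk, n_workers=2))
```

A thread pool keeps the sketch self-contained. For CPU-bound steps in Python, a process pool or a set of remote workers would be the closer analogue of the parallel and distributed settings the abstract mentions.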

Similar resources

Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields

This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...

Extending KNIME for next-generation sequencing data analysis

SUMMARY: KNIME (Konstanz Information Miner) is a user-friendly and comprehensive open-source data integration, processing, analysis and exploration platform. We present here new functionality and workflows that open the door to performing next-generation sequencing analysis using the KNIME framework. AVAILABILITY: All sources and compiled code are available via the KNIME update mechanism. Examp...

Data Dependence Boundary Row Boundary Row Node

Though more difficult to program, distributed-memory parallel machines provide greater scalability than their shared-memory counterparts. Distributed Shared Memory (DSM) systems provide the abstraction of shared memory on a distributed machine. While DSMs provide an attractive programming model, they currently cannot efficiently support all classes of scientific applications. One such class are tho...

From the Desktop to the Grid and Cloud: Conversion of KNIME Workflows to WS-PGRADE

Computational analyses for research usually consist of a complicated orchestration of data flows, software libraries, visualization, selection of adequate parameters, etc. Structuring these complex activities into a collaboration of simple, reproducible and well-defined tasks reduces complexity and increases reproducibility. This is the basic notion of workflows. Workflow engines allow user...

Techniques for Compiling Programs on Distributed Memory Multicomputers

It is widely accepted that distributed memory parallel computers will play an important role in solving computation-intensive problems. However, the design of an algorithm in a distributed memory system is time-consuming and error-prone, because a programmer is forced to manage both parallelism and communication. In this paper, we present techniques for compiling programs on distributed memory ...


Publication date: 2002